The World Wide Web as a Resource for Example-Based Machine Translation Tasks

Author

  • Gregory Grefenstette
Abstract

The WWW is two orders of magnitude larger than the largest corpora. Although noisy, web text presents language as it is used, and statistics derived from the Web can have practical uses in many NLP applications. For this reason, the WWW should be seen and studied as any other computationally available linguistic resource. In this article, we illustrate this by showing that an Example-Based approach to lexical choice for machine translation can use the Web as an adequate and free resource.


Similar resources

Very Large Lexical Databases: An ACL Tutorial

The WWW is two orders of magnitude larger than the largest corpora. Although noisy, web text presents language as it is used, and statistics derived from the Web can have practical uses in many NLP applications. For this reason, the WWW should be seen and studied as any other computationally available linguistic resource. In this article, we illustrate this by showing that an Example-B...


wEBMT: Developing and Validating an Example-Based Machine Translation System using the World Wide Web

We have developed an example-based machine translation (EBMT) system that uses the World Wide Web for two different purposes: First, we populate the system’s memory with translations gathered from rule-based MT systems located on the Web. The source strings input to these systems were extracted automatically from an extremely small subset of the rule types in the PennII Treebank. In subsequent ...


Saffron a Prototype Example for Evidence Based Herbal Medicine

Evidence-based medicine is now generally perceived to be the dominant operating system in conventional medicine. Evidence-based medicine developed concurrently with the internet and the world wide web. This is no coincidence since evidence-based medicine suggests a personal responsibility for clinicians to keep abreast of research that would be difficult without the information access that the ...


The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks

Previous work demonstrated that web counts can be used to approximate bigram frequencies, and thus should be useful for a wide variety of NLP tasks. So far, only two generation tasks (candidate selection for machine translation and confusion-set disambiguation) have been tested using web-scale data sets. The present paper investigates if these results generalize to tasks covering both syntax an...


BITS: A Method for Bilingual Text Search over the Web

Parallel corpora are a valuable resource for machine translation, multilingual text retrieval, language education and other applications, but for various reasons their availability is very limited at present. Noticing that the World Wide Web is a potential source of parallel text, researchers are making efforts to explore the Web in order to build a large collection of bitext. This paper pres...




Publication date: 2000